Abstract

The temporal exponential random graph model (TERGM) and the stochastic actor-oriented
model (SAOM, e.g., SIENA) are popular models for longitudinal network
analysis. We compare these models theoretically, via simulation, and through
a real-data example in order to assess their relative strengths and weaknesses.
Though we do not aim to make a general claim about either being superior to the
other across all specifications, we highlight several theoretical differences the analyst
might consider and find that with some specifications the two models behave
very similarly, while with others the TERGM out-predicts the SAOM.
1 Introduction


When modeling longitudinally observed networks (e.g., panel network data), the analyst
is generally confronted with a choice between two popular candidate models: the temporal
exponential random graph model (TERGM) (Leifeld, Cranmer and Desmarais 2016a; Desmarais
and Cranmer 2010, 2012c) and the stochastic actor-oriented model (SAOM) (Snijders 2001;
Snijders, Steglich and Schweinberger 2007), often known through its software
implementation SIENA. An unusual, even surprising, feature of the literature on longitudinal
network analysis is that authors tend to use one or the other of these techniques without 
explicitly considering or refuting equally applicable alternatives. This is noteworthy as 
previous studies show that the two models often yield considerably different parameter 


estimates based on identical model specifications (Lerner et al. 2013). Even though the


two models accomplish the same inferential task in a somewhat similar way, it is rare to 
see the theoretical fit or empirical performance of the two techniques directly compared 
(Desmarais and Cranmer 2012a).^1 When theory is invoked with respect to model choice,
it is usually done to justify the use of one of these network models relative to a regression
model.^2 Yet, as we demonstrate below, it is not always the case that theory alone is
strong enough to provide the analyst a clear preference for one or the other. So, what is 
one to do? 

Our goals here are to (a) carefully discuss the differences between the TERGM and
SAOM such that the reader may be able to consider the interface between the two models
and substantive theory in any given application area. We also (b) use two simulation
studies to show that differences in the models’ assumptions and mechanics often result in
significantly different substantive conclusions, even when the same theoretical specification
is tested on the same data. We simulate two different types of known data-generating
processes catering to the different assumptions each model makes. The results show that
a violation of these assumptions generally leads to a larger discrepancy between actual
and predicted edges and that one of the two models seems to be more sensitive to misapplication
due to a more specific temporal updating process. Further (c), we seek to
demonstrate the utility of comparing the two competing models in application, as it is
often unclear a priori which of the two models’ assumptions are met more exactly. To
this end, we replicate a popular SAOM specification with a TERGM and compare the
out-of-sample predictive performance of the models. The comparison illustrates that even
in well-studied applications, the appropriateness of the respective model-related assumptions
can be hard to determine theoretically. Lastly (d), we provide a framework and


^1 Desmarais and Cranmer (2012a) set out to develop and explain a blocked Gibbs sampling technique
for interpreting ERGMs and TERGMs at the vertex, edge, or arbitrary subgraph level. As highly cited
analyses with ERGMs were not to be found in the political science literature at that time, the authors
demonstrated their technique by replicating an influential paper by Berardo and Scholz (2010), which
had used a SAOM. Discrepancies in results and fit led to a brief discussion of model differences that was
somewhat tangential to the goals of Desmarais and Cranmer (2012a) but provided the inspiration for
the current work.

^2 There is a substantial literature on why regression models are inappropriate for network data. That
is not a subject we will examine here as it has already been well covered elsewhere. This literature begins
with reviews of the work by Sir Francis Galton (Dow et al. 1984) but also includes more recent work,
such as Hoff, Raftery and Handcock (2002), Hoff and Ward (2004), Goodreau, Kitts and Morris (2008),
Lerner et al. (2013), and Cranmer and Desmarais (2011).
companion software for easily comparing the out-of-sample predictive performance of the
TERGM and SAOM. Thus, researchers whose theory does not provide sufficient guidance
to select a model, or researchers seeking to evaluate the robustness of their results, need
not rely on theory alone to judge which model fits best for a given application.


2 Theoretical Comparison Between the Two Models 


2.1 Common Roots of the SAOM and the TERGM 

The inferential study of temporal dynamics in networks goes back to the 1970s and
1980s, with pioneering work by Holland and Leinhardt (1977), Iacobucci and Wasserman
(1988), Robins and Pattison (2001), Runger and Wasserman (1980), and Wasserman and

A natural starting point in comparing the TERGM and the SAOM is an account of 
their many similarities, followed by a comparison of their differences. The mathematical 
hearts of the TERGM and SAOM are very similar and are both related to the (non-temporal)
exponential random graph model (ERGM) (Wasserman and Pattison 1996).
The models differ only in their sets of possible permutations, their treatment of temporal 
dynamics, and their estimation strategies (discussed in the next two sections). 

The (non-temporal) ERGM can be expressed by its probability density function 


$$P(N, \theta) = \frac{\exp\{\theta^\top h(N)\}}{\sum_{N^* \in \mathcal{N}} \exp\{\theta^\top h(N^*)\}}, \tag{1}$$


where $N$ is the observed network, $\theta$ are the parameters, $h(N)$ is a vector of statistics
computed on the network, and $N^*$ refers to a particular permutation of the network
from the set $\mathcal{N}$ of all possible permutations of the network, holding the number of
vertices fixed. Here, a permutation is defined as another network topology with the same
number of vertices but not necessarily the same number of edges. Equation 1 yields
the exponentiated sum of weighted network statistics over the same sum for all networks
that could have been observed in place of the network being considered in the numerator.
A wide variety of endogenous dependence can be incorporated into the $h(N)$ vector of
statistics, including reciprocity, edge-wise shared partners, four-cycles, and exogenous
covariates, among many others (Morris, Handcock and Hunter 2008). The TERGM and
the SAOM are both closely related to this definition, as will be explored below. For
reviews of the ERGM, its specification, and its estimation, see, for example, Goodreau,
Kitts and Morris (2008), Cranmer and Desmarais (2011), and Lusher, Koskinen and
Robins (2013).
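To make the role of the normalizing sum concrete, the following is a minimal sketch that evaluates the ERGM probability by brute-force enumeration of every directed network on three vertices. The choice of statistics (edge count and mutual dyads), the parameter values, and all function names are illustrative assumptions and are not taken from any ERGM software; enumeration is only feasible for very small networks.

```python
from itertools import product
from math import exp


def all_directed_networks(n):
    """Yield every directed network on n vertices as an adjacency matrix."""
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product((0, 1), repeat=len(dyads)):
        net = [[0] * n for _ in range(n)]
        for (i, j), bit in zip(dyads, bits):
            net[i][j] = bit
        yield net


def h(net):
    """Illustrative statistics vector: (edge count, number of mutual dyads)."""
    n = len(net)
    edges = sum(net[i][j] for i in range(n) for j in range(n) if i != j)
    mutual = sum(net[i][j] * net[j][i] for i in range(n) for j in range(i + 1, n))
    return (edges, mutual)


def ergm_probability(net, theta):
    """exp(theta' h(N)) divided by the same quantity summed over all networks."""
    def weight(candidate):
        return exp(sum(t * s for t, s in zip(theta, h(candidate))))

    normalizer = sum(weight(candidate) for candidate in all_directed_networks(len(net)))
    return weight(net) / normalizer


# Hypothetical parameters: sparse network with a tendency toward reciprocity.
theta = (-1.0, 2.0)
observed = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]
p_observed = ergm_probability(observed, theta)
total = sum(ergm_probability(net, theta) for net in all_directed_networks(3))
```

With three vertices there are only 64 candidate networks, so the denominator can be summed exactly; for realistic network sizes the sum is intractable, which is why simulation-based or pseudolikelihood estimation is used instead.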


2.2 The Temporal Exponential Random Graph Model 

The TERGM builds on the ERGM by first defining the probability of a network at the
current time step t as a function of not just sums of subgraph counts of the current


^3 Both models are also related to the relational events model (REM) (Butts 2008), a third longitudinal network
model that is applicable to temporally fine-grained sequences of edges. While all three models share the
property that one can incorporate sufficient statistics for capturing network dependence, the REM is
not considered further in our comparison because it is not applicable to panel network data.
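network, but also previous networks up to some time step t - K:

$$P(N^{t} \mid N^{t-K}, \dots, N^{t-1}, \theta) = \frac{\exp\{\theta^\top h(N^{t}, N^{t-1}, \dots, N^{t-K})\}}{\sum_{N^* \in \mathcal{N}} \exp\{\theta^\top h(N^*, N^{t-1}, \dots, N^{t-K})\}}. \tag{2}$$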


This assumes that the statistics formed based on the networks between t - K and t
fully encompass the dependencies observed in the network at time t. The denominator 
contains the same kind of normalizing constant as in the ERGM. In a second step, one 
computes the product over all time periods in order to determine the probability of the 
time series of networks: 


$$P(N^{K+1}, \dots, N^{T} \mid N^{1}, \dots, N^{K}, \theta) = \prod_{t=K+1}^{T} P(N^{t} \mid N^{t-K}, \dots, N^{t-1}, \theta). \tag{3}$$


This is a simple extension of the ERGM to a series of networks. To incorporate temporal
dependencies between time steps, the network statistics defined in h can incorporate
“memory terms.” Leifeld, Cranmer and Desmarais (2016a) provide an extensive discussion
of this topic. The key is to specify a temporal statistic related endogenously to
the structure of the network that captures the temporal process at work. A very simple
example would be the dyadic stability term,


$$h_m = \sum_{ij} \left[ N_{ij}^{t} N_{ij}^{t-1} + (1 - N_{ij}^{t})(1 - N_{ij}^{t-1}) \right], \tag{4}$$


which adds value to the statistic whenever the status of a dyad (tied or not) does
not change from one period to the next. Because the temporal processes of interest will
vary from application to application, it is fortunate that the analyst has a great degree
of flexibility in the design of memory terms. The only real restriction is that they should
be specified as sums of subgraph products. As such, including terms to capture edge
persistence, edge innovation, edge dissolution, delayed reciprocity, or arbitrary functions
of time is possible (Hanneke, Fu and Xing 2010; Leifeld, Cranmer and Desmarais 2016a).
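The dyadic stability term described above can be sketched in a few lines. The function name and the example matrices are hypothetical, and the sketch assumes directed networks stored as 0/1 adjacency matrices with self-loops excluded from the sum.

```python
def dyadic_stability(current, previous):
    """Count ordered dyads whose status (tied or not) is the same in both networks."""
    n = len(current)
    total = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-loops are not part of the network
            stayed_tied = current[i][j] * previous[i][j]
            stayed_empty = (1 - current[i][j]) * (1 - previous[i][j])
            total += stayed_tied + stayed_empty
    return total


# Hypothetical two-wave example: one edge dissolves (1 -> 2), one forms (2 -> 0).
previous = [[0, 1, 0],
            [0, 0, 1],
            [0, 0, 0]]
current = [[0, 1, 0],
           [0, 0, 0],
           [1, 0, 0]]
stability = dyadic_stability(current, previous)  # 4 of the 6 ordered dyads are unchanged
```

Other memory terms, such as edge innovation or dissolution, would follow the same pattern by keeping only one of the two products or by replacing them with a different function of the two dyad states.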

This makes the TERGM a flexible model because there are no restrictions on the
dependencies other than the length of the time series. The TERGM does not make any
assumptions as to whether the time that passes between time steps is long or short,
continuous or discrete, or whether edges are established sequentially or simultaneously
in the data-generating process, as long as the outcome of this process can be mapped into
a dependency term that can be incorporated into the h vector. The price of this flexibility
is that the TERGM does not maintain a very close connection to the data-generating
process at the micro-level, as will be discussed below.

Estimation is possible using Markov chain Monte Carlo maximum likelihood estimation
(MCMC-MLE) (Hanneke, Fu and Xing 2010) or maximum pseudolikelihood estimation
with bootstrapped confidence intervals (Desmarais and Cranmer 2010, 2012c),
as implemented in the btergm package (Leifeld, Cranmer and Desmarais 2016b) for the
statistical computing environment R (R Core Team 2016). For a more detailed methodological
treatment of the TERGM, see Hanneke, Fu and Xing (2010), Leifeld, Cranmer
and Desmarais (2016a), Cranmer and Desmarais (2011), Desmarais and Cranmer (2012c),
Cranmer, Desmarais and Menninga (2012), and Cranmer, Heinrich and Desmarais
(2014).
2.3 The Stochastic Actor-Oriented Model


To build a SAOM, one takes the initial observed network as given and models changes to
it using a variant of Equation 1 and a simulation process meant to mimic the evolution
of the network between discretely observed time periods. Both the TERGM and SAOM
include endogenous dependencies and exogenous covariates as subgraph products in the
$h(N)$ term, in an almost identical fashion. Thus, the TERGM and SAOM can be characterized
as very similar models. As will be made clear presently, the major difference
between them stems from the updating processes the two models employ and the support
of their probability distributions.


The SAOM follows Holland and Leinhardt (1977) in positing that network change is
a first-order Markov process, meaning that the network at time t is exclusively a function
of the network at t - 1, a vector of endogenous statistics (h, which can be nearly identical
to the h vector in the TERGM), and a corresponding set of parameters $\theta$ that determine
the evolution of the network between t - 1 and t. The SAOM posits that the time
between t - 1 and t can be broken down into a possibly infinite number of consecutive
time steps (so-called mini-steps) during which vertices (often interpreted as actors)
re-allocate their edges. At each of these mini-steps, two functions are executed: the
rate-of-change function and the objective function.

At each mini-step, the rate-of-change function first selects an actor at rate

$$\lambda_i(N^t) = \rho \quad \forall i. \tag{5}$$


This rate governs a Poisson process of waiting times that assigns an equal probability
of being selected to each vertex at a given mini-step. That is, each vertex has the same
expected waiting time before it is selected (again). The rate function can be specified
in more elaborate ways such that vertices are activated with individual waiting times
corresponding to functions of vertex covariates. When using a weighted rate-of-change
function, characteristics affect the rate-of-change function multiplicatively through an
exponential link function (Snijders, van de Bunt and Steglich 2010). This choice should
be driven by theory; the simplest option is an equal probability with waiting times scaled
to match the amount of change between the observed networks at t - 1 and t (Snijders
2001). Details on the rate-of-change function can be found in Snijders (2001, 2005),
Amati (2011), and Snijders and van Duijn (1997).
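The equal-rate case can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the RSiena implementation: each actor is assumed to carry an independent exponential clock with the same rate, the period between two observations is scaled to length 1, and the names `next_actor`, `rho`, and `schedule` are hypothetical.

```python
import random


def next_actor(n_actors, rho, rng):
    """Draw the waiting time until the next mini-step and the actor who moves.

    With n_actors independent exponential clocks of rate rho, the time until the
    first clock rings is exponential with rate n_actors * rho, and every actor is
    equally likely to be the one whose clock rang.
    """
    waiting_time = rng.expovariate(n_actors * rho)
    actor = rng.randrange(n_actors)
    return waiting_time, actor


rng = random.Random(42)
clock = 0.0
schedule = []  # (time of mini-step, selected actor) within one observation period
while True:
    wait, actor = next_actor(n_actors=5, rho=2.0, rng=rng)
    if clock + wait > 1.0:
        break
    clock += wait
    schedule.append((clock, actor))
```

A covariate-dependent rate function would replace the uniform draw with a draw proportional to actor-specific rates; that extension is not shown here.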


Given that an actor has been selected at a mini-step, the objective function is executed.
The actor selected may add a new edge to another vertex, remove an existing edge, or
leave its edge profile as it is. The choice of which outgoing dyad k to consider (from the
perspective of vertex i) is guided by the function




$$f_i(\theta, N) = \sum_k \theta_k h_{ki}(N). \tag{6}$$


This function accommodates the same h statistics as the ERGM, except that SAOM
statistics are computed from an egocentric perspective. For example, while the (T)ERGM
would include reciprocity as the count over all ij dyads,

$$h_{\text{reciprocity\_ergm}} = \sum_{ij} N_{ij} N_{ji}, \tag{7}$$

the SAOM would include reciprocity from the perspective of vertex i,

$$h_{\text{reciprocity\_saom}}(i) = \sum_{j} N_{ij} N_{ji}. \tag{8}$$


Apart from this difference in the formulation of statistics, any network statistic that can 
be included in the ERGM can also be included in the SAOM. 
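The difference between the global and the egocentric formulation of reciprocity can be sketched as follows. The function names are hypothetical, and the global version is assumed to sum over ordered pairs of distinct vertices, so each mutual dyad contributes twice.

```python
def reciprocity_ergm(net):
    """Global reciprocity: sum of N_ij * N_ji over all ordered pairs i != j."""
    n = len(net)
    return sum(net[i][j] * net[j][i] for i in range(n) for j in range(n) if i != j)


def reciprocity_saom(net, i):
    """Egocentric reciprocity: sum of N_ij * N_ji over j, from vertex i's perspective."""
    return sum(net[i][j] * net[j][i] for j in range(len(net)) if j != i)


# Hypothetical network with one mutual dyad (0 <-> 1) and two unreciprocated edges.
net = [[0, 1, 1],
       [1, 0, 0],
       [0, 1, 0]]
global_count = reciprocity_ergm(net)
ego_counts = [reciprocity_saom(net, i) for i in range(len(net))]
```

Under this convention the egocentric counts add up to the global count, which illustrates that the two formulations carry the same structural information and differ only in whose perspective is taken.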

The decision which dyad to change, as given in Equation 6, is made simultaneously
with the decision on how to change the respective dyad. Intuitively, given that the
objective function concerns an n-category unordered choice (consisting of the three types
of actions: add an edge, remove an edge, or do nothing), the probability of a specific choice is
computed in an identical fashion to that of a multinomial logistic regression:

$$\Pr(N, \theta) = \frac{\exp(f_i(\theta, N))}{\sum_{N^* \in \mathcal{M}} \exp(f_i(\theta, N^*))}. \tag{9}$$


Put more simply, “the probability that an actor makes a specific change is proportional
to the exponential transformation of the objective function of the new network, that
would be obtained as the consequence of making this change” (Snijders, van de Bunt and
Steglich 2010, p. 58). This may look identical to the probability definition of the ERGM
in Equation 1, but there are two notable differences: $\mathcal{M}$ contains only the network states
that are under direct influence of vertex i through the objective function (as opposed
to all permutations in the ERGM), and the dependency terms $h_{ki}(N)$ in $f_i(\theta, N)$ are
formulated from the perspective of i (see Equations 7 and 8).
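A single mini-step of the choice process just described can be sketched as a multinomial logit over the networks the selected actor can reach. This is an illustrative sketch, not the RSiena algorithm: the objective function is assumed to contain only an outdegree and a reciprocity statistic, the candidate set is assumed to consist of the unchanged network plus every network that differs in exactly one outgoing tie of the actor, and all names are hypothetical.

```python
import math
import random


def objective(net, i, theta):
    """Egocentric objective: weighted sum of actor i's outdegree and reciprocated ties."""
    n = len(net)
    outdegree = sum(net[i][j] for j in range(n) if j != i)
    reciprocated = sum(net[i][j] * net[j][i] for j in range(n) if j != i)
    return theta[0] * outdegree + theta[1] * reciprocated


def candidate_networks(net, i):
    """Networks reachable by actor i in one mini-step: no change, or one toggled tie."""
    candidates = [[row[:] for row in net]]
    for j in range(len(net)):
        if j != i:
            changed = [row[:] for row in net]
            changed[i][j] = 1 - changed[i][j]
            candidates.append(changed)
    return candidates


def choice_probabilities(net, i, theta):
    """Multinomial logit probabilities over actor i's candidate networks."""
    candidates = candidate_networks(net, i)
    weights = [math.exp(objective(c, i, theta)) for c in candidates]
    total = sum(weights)
    return candidates, [w / total for w in weights]


def mini_step(net, i, theta, rng):
    """Sample the network that results from actor i's decision."""
    candidates, probs = choice_probabilities(net, i, theta)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical example: actor 0 has an incoming tie from actor 1 that it may reciprocate.
net = [[0, 0, 0],
       [1, 0, 0],
       [0, 0, 0]]
theta = (-1.0, 2.0)  # assumed values: ties are costly, reciprocated ties are rewarded
candidates, probs = choice_probabilities(net, 0, theta)
new_net = mini_step(net, 0, theta, random.Random(1))
```

In contrast to the ERGM denominator, the sum here runs over only n candidate networks for a network with n vertices, which is what makes the mini-step cheap to simulate.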

The SAOM can be estimated using the (generalized) method of moments by means of
a (modified) Robbins-Monro stochastic approximation algorithm (Snijders 2001; Amati
2011). The model was implemented in the R package RSiena (Ripley et al. 2016; Ripley,
Boitmanis and Snijders 2013).

As this discussion has shown, the SAOM and TERGM are similar in many respects
related to their core equations and specifications. Under a specific and rarely applied rate
function, the SAOM even has an ERGM as its limiting distribution (Snijders 2001). A
comparison of Equations 2 and 9 with Equation 1 shows how both have strikingly similar
mathematical cores with a nearly identical way of specifying sufficient network statistics.


2.4 Differences between the TERGM and the SAOM 

The similarity between these models may lead to some confusion as to which model is
preferable for a given application. In line with the “no-free-lunch theorem” (Boulesteix,
Lauer and Eugster 2013), we believe that one cannot claim that one of
these models is generally superior to the other across contexts, but that the question
of which model to choose must be addressed with respect to the specific application.
Consequently, we review the assumptions of each model before considering the effects
that these differences have on model performance.

Before we do so, however, we note that the SAOM and the TERGM each have their
own recent or ongoing extensions, which may offer advantages over each other in specific
contexts. The SAOM, for example, is able to estimate multiple interdependent network
processes in a single model (Snijders, Lomi and Torló 2013). There are variants for “multiplex”,
“multi-relational”, or “multi-level” ERGMs as well (Wang et al. 2013), but they
have not yet been extended to the temporal case. The ERGM has also seen the development
of variants for weighted (valued) edges (e.g., the Generalized ERGM) (Desmarais
and Cranmer 2012b), and extensions to the temporal case should be straightforward.


Here, we focus on the simplest case: a comparison between the SAOM and the TERGM
for a single, binary network over multiple time periods as described by Leifeld, Cranmer
and Desmarais (2016a) and implemented in Hanneke, Fu and Xing (2010); Leifeld, Cranmer
and Desmarais (2016b); Desmarais and Cranmer (2010, 2012c) for the TERGM and
as described in Snijders (2001); Snijders, Steglich and Schweinberger (2007); Snijders,
van de Bunt and Steglich (2010) and implemented in Ripley et al. (2016) for the SAOM.
We consider these the canonical forms of the two models and limit our discussion to
them. Contrasting the various extensions of these models is left to future research.


Primacy of Actors and Micro-Level Modeling 

One may be tempted, based on the difference in names, to think that the major theoretical 
difference between the SAOM and TERGM relates to the primacy of actors (the vertices 
of the network). In fact, the major difference lies not so much in the primacy of actors, 
but more in the timing and updating processes assumed in the two models. However, it 
is natural to begin by clarifying the extent to which the two models place primacy on 
the actors. 

The TERGM, in its basic formulation, is a model of the edges in a network. These 
edges are modeled as a function of the topology of the network itself, exogenous relational 
covariates, and exogenous vertex (actor) attributes, either concurrently or as a function 
of past states of the network (see Equation 2). As a consequence, the TERGM assumes
little in the way of actor agency; the TERGM is consistent with more detailed models 
that provide for high degrees of agency from the actors, but it is also consistent with 
models that assume no agency at all. In other words, the TERGM has little to say by 
virtue of its basic mathematics about the primacy of actors or their agency, but it can be 
adjusted, through the dependence statistics and covariates included in its specification,
to model the structure of the network as a function of actor-centric processes. 

For example, conflicts in the international system can be modeled by describing structural
effects like two-stars (an edge- or subgraph-level interpretation), but these counts of
subgraph products can be interpreted from an individual actor perspective. For instance, 
one could posit that one state attacks another state because it has been attacked by a 
third party before (preferential attachment), thus leading to a topology where two-stars 
are prevalent. The TERGM is therefore compatible with actor-based theories, but it does 
not assume per se that actors make these decisions. For instance, one could also apply 
a TERGM to the changing topology of electrical power networks and model endogenous 
features like two-paths without invoking theory related to the primacy of actors. 

The SAOM is explicitly actor-centric, hence its name as an “actor-oriented” model,
in the sense that one might regard the determination of outgoing edges as part of the
behavior of the actor. The SAOM includes two processes, related to the timing and
nature of edge changes, that are built specifically around the agency assumed of the
actors (Snijders, van de Bunt and Steglich 2010) (see Equations 5 and 6). Logically,
these stochastic processes assume the agency of the actor: that the actor has the ability
and motivation to change his/her/its edges. We will discuss these two processes again
below, as they relate more to timing and edge updating than actors. What is important
to make clear here is that the SAOM is not actor-oriented in the sense that the outcome of
interest is an actor-level variable; as with the TERGM, edges and the network topology
resulting from these edges are the outcomes of interest. The SAOM is a model of the
changes in relationships between actors (thus, a model of the network and not a model
of the actors) that is conditioned by specific assumptions about how actors behave.^4

It is also important to note that both models can contain identical sets of model terms 
in their respective h vectors. It may feel more intuitive to use the SAOM if one tests a 
theory that is based on methodological individualism. Yet, both models permit identical 
theoretical processes to be modeled, and any differences in theory are either related 
merely to a subjective interpretation one attaches to the equations, or they are related 
to the mini-step updating process of the SAOM that is absent in the TERGM. Therefore 
the remaining comparison of the two models will focus on the mini-step updating process 
inherent in the SAOM and absent in the TERGM. 


Temporal Updating Process, Sequentiality, and Multi-Party Events 


For both the TERGM and SAOM, the network of interest is observed at several distinct, 
discrete time points, but one does not observe what happened in the network between 
observations. For example, assume a friendship network is sampled from a school on 
the last day of the school-year for four years. We observe four discrete networks, but 
do not know how the network changed between points in time. Some friendship edges 
observed at time t may have dissolved by time t + 1, and others may have dissolved 
and re-formed during this time. Depending on how often network edges change and 
how often the network is observed (properties that will vary greatly from application
to application), quite a bit of change may go unobserved between network snapshots.
Often, one models networks at distinct time points because this is more convenient or
cost-efficient than observing data on a continuous basis. In these cases, the shorter the
time span between the observed time points, the more strongly the network at time t
usually depends on the network at t - 1.

The SAOM deals with this variability in temporal dependence by determining the
required number of mini-steps as a function of the amount of change between the two
consecutively observed networks (Amati 2011: 147). That is, the simulations stop when
as many edges have been changed as between the observed networks (with unconditional
estimation) or after a pre-defined number of mini-steps reflecting the amount of change
between the first and the second network (with conditional estimation) (Snijders


The TERGM, in contrast, is not built around the idea of modeling plausible paths
between the observed time steps. It rather models only the data that are observed. In
its simplest version, the TERGM is merely a pooled ERGM without any temporal dependence,
thus assuming independence between consecutive networks. In most temporal
settings, independence between time steps is unrealistic, so the user needs to think
about how the previous network influences the current network. Instead of modeling this
transition using mini-steps as in the SAOM, however, the outcome of this transition is
mapped into the vector of user-supplied network statistics when the TERGM is used.


^4 We consider the original formulation of the SAOM as a model of network change here. For a
discussion on SAOM variants pertaining to the behavior of actors, see Section 2.4
For example, suppose that preferential attachment is to be tested. In a SAOM, one
would test for two-stars or indegree popularity over the (unobserved) sequence of mini-steps
to establish whether preferential attachment is a plausible mechanism given the two
observed time points. In a TERGM, one would test either for two-stars within the second
time step or for two-stars over time (i.e., where the first edge is observed at the first and
the second edge is observed at the second time step). Only in the second variant would
the TERGM assume that the second edge occurred after the first edge of the respective
two-star. In the first variant, simultaneous formation of both edges is equally plausible
as a temporal order of the two edges.
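The two TERGM variants can be contrasted with a small sketch. It assumes that "two-star" is read as an in-two-star (two edges pointing at the same vertex) and that the over-time variant requires the second edge to be newly formed in the second network; both readings, as well as the function names, are assumptions made for illustration.

```python
def in_two_stars(net):
    """Count pairs of edges within one network that point at the same vertex."""
    n = len(net)
    count = 0
    for j in range(n):
        indegree = sum(net[i][j] for i in range(n) if i != j)
        count += indegree * (indegree - 1) // 2
    return count


def temporal_in_two_stars(first, second):
    """Count pairs (i -> j present in `first`, k -> j newly formed in `second`), i != k."""
    n = len(first)
    count = 0
    for j in range(n):
        for i in range(n):
            if i == j or not first[i][j]:
                continue
            for k in range(n):
                if k in (i, j):
                    continue
                if second[k][j] and not first[k][j]:
                    count += 1
    return count


empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
one_edge = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]   # 0 -> 2
two_edges = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]  # 0 -> 2 and 1 -> 2

# Sequential formation: 0 -> 2 exists first, 1 -> 2 follows.
sequential = (in_two_stars(two_edges), temporal_in_two_stars(one_edge, two_edges))
# Simultaneous formation: both edges appear between the two observations.
simultaneous = (in_two_stars(two_edges), temporal_in_two_stars(empty, two_edges))
```

In this toy example the within-time-step count is the same in both scenarios, while the over-time count is nonzero only when one edge was already observed at the first time point, which mirrors the distinction drawn in the paragraph above.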

One may be tempted to argue, based on this comparison, that the SAOM is better
able to operationalize the concept of preferential attachment here because it posits a more
fine-grained temporal updating process. Yet, this requires that the mini-steps accurately
reflect the true temporal development on average, which will be tested below. Another
way of putting it is that the SAOM creates artificial data to test a micro-level theory, but
at the end of the day these artificial data can only encode information that is actually
observed in the two observed time steps, therefore it cannot completely rule out that the
edges are actually formed simultaneously or in reverse order (that is, in a way that is
accounted for in the seemingly less specific TERGM).

This prompts an interesting philosophical question on the temporal order of social 
processes. One may argue that true simultaneity never occurs in physical systems and 
consequently does not occur in social systems either. Yet, a distinction needs to be made 
between absolute time and measurable time from the point of view of the entities that 
cause edge formation. For instance, in an e-mail communication network, sending an
e-mail to two recipients on a single press of a button occurs at two consecutive time
points because the server does not process the two recipients truly simultaneously. But
from a theory perspective, this fine-grained temporal order does not matter because the
sender does not make two separate decisions, but rather one joint decision to send out 
two e-mails, and the two e-mails are visible to the recipients (and possibly others) with 
such a small time interval between them that they are given the chance to respond to 
these e-mails as if they had arrived simultaneously. 

An area of application in which simultaneity as perceived by actors matters is collective
action theory. While true simultaneity is rarely achieved, theory assumes that
groups collectively engage in non-action until the incentive structures are changed such 
that all actors have an incentive to become active at once. In network terms, it matters 
from a theoretical perspective whether the dissolution of an edge instantaneously creates 
an incentive for a group of actors to form edges or whether the dissolution of an edge 
leads to a single new edge somewhere in the network, that edge creates an incentive for 
somebody else to create another edge, and so on in a Markovian updating process. The 
SAOM assumes such a sequentiality in the temporal updating process while the TERGM 
remains agnostic as to whether edges were formed in a sequence or simultaneously as 
long as the statistics are found in the outcome network. 

As a consequence, one can expect theories that posit simultaneity of multi-party 
events—such as theories building on collective action—to be more compatible with the 
TERGM while theories positing sequential edge updating should be equally compatible 
with both models. This should lead to different parameter estimates in the former but 
not in the latter case. The difference is not due to the sequentiality of the SAOM edge 
updating process per se, but rather due to the fact that the rate-of-change function may
impose quite a long waiting time between edge events that are supposed to be simultaneous,
thereby changing the network to some extent between supposedly simultaneous
moves.

As a case in point, consider the network of international defensive alliances, where 
NATO was initially formed in a single, coordinated treaty. Had opponents of NATO 
been allowed to react between the individual formation steps of the prospective NATO 
members, this may have deterred the founding states from completing the formation of 
NATO. In short, we expect the mini-step sequences of the SAOM to depart from the 
true data-generating process in applications that are based on collective action situations 
and other simultaneous edge formation or dissolution behavior. In other applications— 
perhaps friendship networks—sequentiality may or may not be a plausible assumption. 

It is important to acknowledge that the expected difference is merely due to the 
temporal updating process assumed by the SAOM (and not assumed by the TERGM), 
not due to any user-supplied statistics, which can be identical. On the other hand, 
however, because the assumptions made by the TERGM are less specific, the analyst 
may face an increased burden in the specification of the model because she may need to 
add sufficient structure to the TERGM for it to match her hypothesized data-generating 
process. 


Conditionality on Previous Time Points and Goodness of Fit 


There are two principal ways in which the SAOM and TERGM differ with respect to 
changes in the configuration of the network between observed time points. The first is 
simplest. The SAOM takes the first network to be given (not modeled) and then models 
changes from that base network going forward (Lusher, Koskinen and Robins 2013). It is
natively a model of changes in networks. The TERGM, which can be said to follow the 
core ERGM more closely, is a model of the network itself, rather than changes to it. The 
TERGM conditions on at least one previous network but natively models the structure of 
the network over time, rather than changes to it. That said, the TERGM can be adjusted 
to closely mimic the SAOM in the modeling of changes by including a “memory” term, 
which indicates whether a given edge existed in the previous time period (for an extended 
discussion of memory terms, see Leifeld, Cranmer and Desmarais 2016a). In the TERGM,
one can control for the effect of the previous network on the current network and then 
proceed by interpreting the parameters of the remaining statistics as network effects 
conditional on the previous state of the network. The SAOM conditions on the previous 
network by letting the mini-step process depart from the previous network. While the 
amount of network change in the TERGM is visible in the parameter of the model term 
that conditions on the previous network, the amount of change in the SAOM is visible 
in the rate parameter(s). 
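The memory term described above can be illustrated with a small sketch. It assumes a dyadic stability statistic that counts ordered dyads whose state (edge or non-edge) is unchanged between two consecutive networks. The function name `dyadic_stability` and the toy matrices are hypothetical and are not taken from any package.

```python
def dyadic_stability(prev, curr):
    """Count ordered dyads (i, j), i != j, whose state (edge or non-edge)
    is the same in the previous and the current network."""
    n = len(curr)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and prev[i][j] == curr[i][j]
    )

# Toy directed networks at t - 1 and t (adjacency matrices, assumed data).
prev = [[0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]]
curr = [[0, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]

stable = dyadic_stability(prev, curr)
print("stable dyads:", stable, "of", len(curr) * (len(curr) - 1))
```

A large positive coefficient on such a statistic indicates that little change occurs between observations, which is the sense in which the amount of change is visible in the TERGM parameter.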

Some may see the inclusion of such a memory term as advantageous because it makes 
the autoregressive process more transparent, whereas this process is obscured somewhat 
by the SAOM’s updating procedure. Others may find the continuous-time structure 
of the SAOM advantageous for modeling a process that unfolds in continuous time (or 
nearly so) and is observed “as snapshots” at some moments, especially considering that 
the population parameters of the TERGM are sensitive to the interval of observation.
Ultimately, these are merely differences in the interpretation of the parameters. If the
TERGM employs a memory term for conditioning on the previous network in lieu of 
the rate parameter of the SAOM, these differences in interpretation should not affect 
the fit of the remaining parameters in a given model if the same h statistics are used to 
construct the model. 

The two models also differ with respect to the role of observation frequencies or
inter-observation times. Because the SAOM provides a data-generating process as part of the
model, whereas the TERGM simply conditions on a previous network or networks, the 
parameter values of the SAOM that are not related to the rate-of-change function do 
not depend on the observation times or observation frequencies. TERGM parameters for 
model terms that posit dependence between time steps depend on the inter-observation 
times, potentially strongly. For example, consider a time series of yearly observations 
for 20 years. Modeling the series at yearly intervals with the TERGM will give different 
results for the temporal dependency terms than modeling the series at intervals of two 
or five years because the population parameters are different, not just because there 
are fewer data and because of random variability (which would also affect results). For 
the SAOM, the population parameters are invariant, given the rate parameter. As such, 
changing the period of observation for a SAOM would only affect the parameters through 
sparser data and random noise. 
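The dependence of temporal parameters on the observation interval can be made concrete with a deliberately simple toy process, which is an assumption introduced here and not one of the models compared in this paper: every dyad flips its state independently with probability `p_flip` per unit of time. In a model containing only a dyadic stability term, the population coefficient equals the log-odds that a dyad keeps its state over the observation interval, so it shrinks as the interval grows even though the underlying process is unchanged.

```python
import math

def stability_probability(p_flip, k):
    """Probability that a dyad is in the same state after k unit intervals,
    if it flips independently with probability p_flip in each interval."""
    return 0.5 * (1.0 + (1.0 - 2.0 * p_flip) ** k)

def implied_memory_parameter(p_flip, k):
    """Log-odds of stability: the population value of a dyadic stability
    coefficient in a model that contains only that term (toy assumption)."""
    s = stability_probability(p_flip, k)
    return math.log(s / (1.0 - s))

# Same process, observed every 1, 2, or 5 time units.
for k in (1, 2, 5):
    print(k, round(implied_memory_parameter(0.1, k), 3))
```

The printed coefficients differ across intervals because the population parameter itself differs, not because of sampling variability.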

While independence of the time range between two observation points may be seen as 
an advantage in the interpretation of effects in the SAOM, the flipside of this argument 
is that the role of the previous network(s) can be interpreted in a more transparent 
way in the TERGM precisely because the temporal dependency statistics (e.g., delayed 
reciprocity or memory terms) reflect the degree of dependence on previous time points in 
the parameter estimates. For example, with very short inter-observation time intervals, 
one can expect little change to occur such that temporal dependencies as control variables 
yield large estimates in the TERGM. On the one hand, this may be mistaken for a good 
model fit if the inter-observation times are small, so interpretation of overall fit must take 
into account the time span between observations (as in any model with lagged outcome 
variables). On the other hand, these model terms give an accurate representation of the 
degree of dependence on the previous time point(s) while the importance of the previous
time point in the SAOM can only be evaluated by interpreting the rate parameters. Yet
these parameters only represent the overall amount of change in a network rather than
permitting parameterization of the relevant aspects of change as in some TERGM model
terms, such as delayed reciprocity, delayed triadic closure, or dyadic stability. If
inter-observation times are small in the SAOM, this does not artificially inflate the goodness
of fit of the model. However, as the role of the previous network for the current network
is not entirely clear from the estimates, there is a risk of misattributing variation in
the network to h statistics that would be more easily explained by path dependency.
Therefore one may argue that the TERGM does not artificially inflate model fit, but
rightfully attributes importance to the role of the previous network(s) rather than the
updating process.


Longer-Term Temporal Dependence


The TERGM assumes that temporal dependence occurs up to K time steps in the past, 
and it requires the user to specify these dependencies. For example, it may be realistic 
that wars five to ten years ago impact wars in the current year, in addition to the wars
just in the previous year. Depending on the application, the need to specify the dynamic 
process for the TERGM may raise the burden of specification on the analyst. 

The SAOM assumes that dependence on events before t - 1 is completely factored
into the previously observed network (t - 1). That is, it assumes a first-order Markov
process. This places a lighter burden on the modeler with regard to model specification, 
but there may be cases where events that happened multiple time steps ago may not be 
captured by controlling for events at the previous time step. 

As Desmarais and Cranmer (2012a) note, these different specifications result in subtly
but importantly different distributions assumed by the two models: TERGMs take the
first k networks to be given and do not model them, thus the networks that are modeled
(those occurring after k periods) are assumed to be in their stationary distribution conditional
on the first k. The SAOM takes the first network as given, and models changes
thereafter, thus following a stationary stochastic change process rather than a stationary 
stochastic series of graphs. 

To make the contrast in basic assumptions clearer, consider some specific conditions
for the two models. For the SAOM, the usual procedure for generating a sample is the
updating process described in Section 2, whereas for the TERGM samples are usually
generated through the indefinitely continued edge-wise updating of a Gibbs sampler. For
the SAOM, sufficient conditions are its stochastic updating process (the usual procedure
for generating a sample), and (more restrictive) myopic stochastic optimization by
actors. For the TERGM, sufficient conditions are indefinitely continued edge-wise updating
according to Gibbs sampling by edge variables (the usual procedure for generating a
sample), and indefinitely continued myopic stochastic optimization by edge variables.
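The edge-wise Gibbs updating mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the sampler of any particular package: the model contains only three terms (edges, reciprocity, and a dyadic stability term conditioning on the previous network), the function name `gibbs_sweep` and all parameter values are hypothetical, and a finite number of sweeps stands in for the indefinitely continued process.

```python
import math
import random

def gibbs_sweep(curr, prev, theta_edges, theta_recip, theta_stab, rng):
    """One edge-wise Gibbs sweep over all ordered dyads of a directed network.

    Each dyad (i, j) is redrawn from its conditional distribution given the
    rest of the network; the log-odds of an edge are the change statistics
    weighted by the parameters."""
    n = len(curr)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta_edges = 1                   # switching the edge on adds one edge
            delta_recip = curr[j][i]          # adds a mutual dyad if j -> i exists
            delta_stab = 2 * prev[i][j] - 1   # +1 if the edge existed before, else -1
            log_odds = (theta_edges * delta_edges
                        + theta_recip * delta_recip
                        + theta_stab * delta_stab)
            p_edge = 1.0 / (1.0 + math.exp(-log_odds))
            curr[i][j] = 1 if rng.random() < p_edge else 0
    return curr

# Toy usage: start from the previous network and run a fixed number of sweeps.
rng = random.Random(1)
prev = [[0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]]
curr = [row[:] for row in prev]
for _ in range(100):
    gibbs_sweep(curr, prev, theta_edges=-1.0, theta_recip=0.5,
                theta_stab=2.0, rng=rng)
```

The sampler updates edge variables rather than actors, which is why no assumption about who initiates a change is required.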


Actor Homogeneity, Strategic Behavior, and Size Limitations 

Both models assume that there is a homogeneous process operating on the network. 
More specifically, the SAOM posits that all actors have the same objective function
and thus exhibit the same social behavior (on average): they wish to maximize their
values of the network statistics included on the right-hand side of the specification. The
TERGM makes a similar assumption by positing a single joint data-generating process 
over all dyads in the network. The differences only affect the interpretation, but not the 
underlying mathematics: while both models assume homogeneity of the process over the 
whole network, this homogeneity does not necessarily have to be interpreted as actor 
homogeneity in the TERGM. 

In the TERGM, the user models counts of subgraph products. As pointed out in 
the subsection on temporal updating above, the TERGM is compatible with an actor 
perspective, but this perspective needs to be operationalized into countable edge-based 
network statistics. Therefore, rather than assuming that all actors have the same goals 
as in the SAOM, the TERGM assumes that the distribution of subgraph counts captures
potential heterogeneity between vertices, thereby deviating from an actor-oriented
perspective when necessary.
However, the SAOM can accomplish exactly the same task because one can specify
identical subgraph products (read: h statistics) in both models. For example, if one 
models the number of edge-wise shared partners per dyad in a TERGM, the distribution 
of edge-wise shared partners per dyad reflects heterogeneity among actors or dyads, but 
on average each dyad has the same probability to be located at a certain point in this 
distribution, which is captured by the estimate for the respective statistic. Similarly, the 
SAOM estimate posits the same tendency of actors to engage in edge-related behavior 
leading to the same outcome distribution of the statistic, and this distribution captures 
heterogeneity in the network in the same way as in the TERGM. 

That said, in both models it is possible to model actor heterogeneity explicitly by 
adding vertex-level covariates (e. g., actor type) and interacting them with other elements 
of the h vector. In this case, however, one defines exogenously where the heterogeneity
is located in the graph. 

The actor homogeneity assumption built into the SAOM has two consequences. First, 
the SAOM and the TERGM differ subtly with regard to their applicability to rational-choice
theories. Many theories in the social sciences assume rational or otherwise forward-looking
actors (e.g., game-theoretical models). In the SAOM, when actors evaluate their
objective function at a given mini-step, they cannot “look down the game tree.” That 
is, the SAOM assumes that actors compare the network with singly permuted (one-edge 
changed) networks, not the long-run change that will result from immediate changes. As 
such, the updating process posited by the SAOM is theoretically incompatible with most 
rational-choice theories. Future research may try to tackle this problem by proposing 
SAOM variants that use a multinomial choice model with alternatives being evaluated 
by backward induction of user-specified games in lieu of Equation Right now, the 
mini-step simulation process is likely led astray in many empirical applications because 
actors do not evaluate the responses of other actors to their actions. This may lead to 
omitted variable bias by design in situations where strategic actions matter. 

The TERGM does not have provisions for strategic behavior built into the model 
either and thus potentially suffers from the same problem. Yet it is unclear whether the 
problem is equally severe in both models since the TERGM does not have a mini-step 
updating process. A potential solution may be to compute the equilibrium of the game
one wants to model and to include it as a network statistic. This of course suggests that
one does not model the strategic interaction at the micro-level (i.e., every interaction 
in the game), but rather tests whether the outcome of this local process is present on 
average. 

Neither of the two models has provisions for these kinds of extensions at this point, 
but they may be easier to implement in the TERGM from a user perspective because 
only the h(Y) vector needs to be adjusted, not the temporal updating process built into 
the statistical model. In principle, the same equilibrium statistics could be included in 
the SAOM, but the interpretation would differ: theoretically, this would mean positing 
that actors homogeneously work together towards achieving the equilibrium, even though 
the equilibrium may actually be the result of a non-cooperative process that crucially 
rests on the assumption of heterogeneity of preferences. Therefore, while both models 
may or may not be able to arrive at the same estimates in game-theoretic situations, the 
usual actor-level interpretation of the SAOM may be confusing when applied to strategic 
situations while the TERGM does not have to be interpreted from a micro-level actor
perspective. It is also unknown to what extent forward-looking behavior is prevalent in
social networks, and it may vary considerably by subject area. 

Finally, the homogeneity assumption may have consequences for the size of networks
that can be modeled with each technique. The updating process of the SAOM assumes
that all actors, when they have a chance to update their edge profile, observe the entire
topology of the network in order to make their updating decisions. While this is
reasonable in very small groups, it is hard to imagine that actors can know the entire
topology of even medium-sized networks. Given that there are $2\binom{n}{2} = n(n-1)$ possible directed
connections in a network, where n is the number of vertices, one is tempted to think that
the assumption of actors knowing the network topology is only valid for very small
networks. Indeed, if there are only 150 actors in a network (e.g., a very small school),
that implies 11,175 dyads, or 22,350 possible directed edges (or lack thereof), that an actor
would have to be aware of and properly account for in order to use the objective function
to update his/her edge profile.
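The size of the information set in the school example can be checked with a few lines of arithmetic. The helper name `dyad_counts` is hypothetical; it simply counts unordered pairs of vertices and the corresponding ordered pairs.

```python
from math import comb

def dyad_counts(n):
    """Return (unordered vertex pairs, ordered vertex pairs) for n vertices."""
    undirected = comb(n, 2)    # n * (n - 1) / 2
    directed = 2 * undirected  # n * (n - 1)
    return undirected, directed

for n in (10, 150, 1000):
    print(n, dyad_counts(n))
```

For n = 150 this gives 11,175 unordered pairs and 22,350 ordered pairs, and the count grows quadratically with the number of actors.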

While neither model is computationally capable of estimating models on very large
networks, both can estimate models with thousands of vertices. However, the reasonableness
of the SAOM assumption seems to hold only for small networks, thereby imposing
an unspecified size limitation by theory. Users can estimate a SAOM even if this theoretical
assumption is not met, but it is yet unclear if this leads to biased estimates because
the Markov simulations are based on an unrealistic process. In the TERGM, in contrast,
there is no theory that specifies such a size limitation. For example, if the user specifies
a preferential attachment mechanism by including temporally delayed two-stars, this
statistic is counted across the whole network without assuming that actors consider all
other actors’ edges and non-edges, i.e., the TERGM cannot depart from a theoretically
specified Markovian updating process because it is agnostic as to whether such a process
caused the transition from t - 1 to t. In the SAOM, the specification of such a model
term assumes that actors make a choice over all other actors and their inter-connections
in the network and thereby rely on perfect information.

What happens if the SAOM is applied to an empirical case where individuals cannot 
observe all other actors’ relations? Future research will need to evaluate whether a violation
of these theoretical assumptions in the respective models leads to biased estimates
or whether these are merely differences in interpretation. While we cannot offer a comprehensive
and nuanced comparison that takes into account all the differences pointed
out in this review, we seek to illustrate the need for more focused comparisons below
by contrasting two extreme cases that are maximally compatible with the respective
assumptions of the SAOM and the TERGM and by demonstrating that the plausibility 
of a model’s theoretical assumptions is consequential for its performance. 


Behavioral Component and Estimation 


One of the great innovations of the SAOM, and an often-cited reason for using it
(Veenstra et al. 2013), is that it includes the option to model a dynamic behavior (vertex
attribute) “simultaneously” with the evolution of the network. Desmarais and Cranmer
(2012a) showed that the TERGM can also incorporate a behavioral component, though
the dynamics of the two work somewhat differently. Both behavioral components are
statistical models for the vertex attribute at time t that include, on their right hand
sides, other vertex attributes and vertex-level measurements on the network (e.g., the
centrality of the vertex).

Where the two behavioral components differ is that the SAOM iteratively estimates 
the behavioral model for each of the simulations between time periods conditioning the 
behavioral model on the most recent simulation values available. The TERGM simply 
employs a (generalized) linear regression and conditions on the previous time point. It is 
also the case that, if actor behavior is being modeled as part of the network-generating 
process as in the SAOM, simulated intermediate vertex behavior will also be accounted 
for in the network-generating model (Steglich, Snijders and Pearson 2010). Though the
iterative simulations can imply a more immediate effect of previously observed networks 
on behavior, these models do not incorporate any simultaneous dependence between the 
network and behavior. 

Thus, it is fair to characterize the behavioral component of the SAOM as more dynamic
than that included in the TERGM. However, it is also unclear a priori which
formulation will perform better in explaining and predicting vertex attributes. Whether 
or not a behavioral model that incorporates simulated intermediary steps rather than 
simply conditioning on observed behavior is more realistic will be entirely dependent on 
the accuracy of the simulation process. If the edge-updating process described above is 
out of step with the data-generating process, the more dynamic process may actually 
perform worse than the simpler behavioral model. 

We leave the comparison and evaluation of model components for vertex attributes to 
future research and focus on the network aspects of the two models. Both models have 
these capabilities, but like the network parts, they differ substantially in their assumed 
updating processes. 

Moreover, as outlined above, the two models have different estimation strategies.
Though each of the estimation procedures is different and may require attention to different
things (e.g., good mixing and convergence with Markov chains), they should all provide
valid estimates of any given model. As such, we see the topic of estimation as somewhat
beside the point of the present analysis: unless any of the approaches are outright
wrong (and each has a substantial literature behind it), it should not matter which
particular strategy is used. Put another way, if we are willing to assume the validity
of each of these estimation strategies, any differences in fit and performance between a
TERGM and a SAOM with comparable specifications will be the result of the model
structure.

The TERGM and SAOM do, however, share a common limitation with respect to 
estimation: estimation of both models is computationally intensive and time consuming. 
As a result, both models tend not to be estimable in short time periods (weeks/months) 
for very large networks. As increases in computational power meet efficiencies in coding 
and advances in parallelization, this restriction is weakening over time. However, as of 
this writing, neither model is easy to estimate as the number of vertices and temporal 
observations increase.


3 Simulation-Based Comparison


3.1 Theoretical Expectations 


The previous section considered a number of differences in the theoretical assumptions
of the TERGM and the SAOM, despite the many similarities of the two models. It is not
yet clear to what extent these theoretical differences also translate into a difference between
estimated parameters and the parameters of the true data-generating process when the
theoretical assumptions are not met by the data.

Here, we select two critical cases for comparison, one with a data-generating process 
that is maximally compatible with the SAOM and one that is maximally compatible 
with the TERGM. Our expectation from the theoretical comparison is that the performance
of the SAOM should be worse than the performance of the TERGM when the
data-generating process is a TERGM updating process as generated using a TERGM
equation because some of the assumptions made by the SAOM may be violated. We
also expect that the performance of the SAOM and the TERGM should be quite similar
when the data-generating process is a SAOM updating process as generated using
a rate and objective function because the TERGM is compatible with many kinds of 
data-generating processes including the assumptions made by the SAOM. 

More specifically, for the SAOM process, we expect that a Markovian data-generating 
process without strategic action in a moderately-sized network without simultaneous edge 
formation should be recovered equally well by the SAOM and the TERGM because such 
a process meets all theoretical assumptions of the SAOM and because the TERGM is also 
compatible with these assumptions. We simulate such a process using the NetSim package
(Stadtfeld 2015) for the statistical programming environment R (R Core Team 2016). For
the TERGM process, we expect that once we do not impose a continuous-time sequential
edge updating process by actors, but rather a TERGM process determined by discrete
updating between time steps and from an edge-based perspective using a Metropolis-Hastings
network sampler as implemented in the ergm package (Hunter et al. 2008), the
TERGM should be better able to recover the true parameters of the data-generating 
process because such a process violates a number of SAOM assumptions. 

This comparison of two extreme cases—one directly using the data generating process 
of the SAOM and the other of the TERGM—serves to illustrate that these differences 
do matter and that one should pay careful attention to select an appropriate statistical 
technique in applied research. If we do find differences between the two models based on 
the two different processes, this may suggest that empirical networks may also vary on 
this continuum or even beyond, and this will pave the way for a research agenda where 
more focused comparisons can be conducted. 


Note that one might also approach this question by conducting simulation experiments in order to
test each of the identified differences for an impact on model performance. For example, one could test 
whether increasing levels of edge simultaneity in the data-generating process lead to a deterioration of 
model performance in the SAOM and in the TERGM, respectively, all else being equal. Our theoretical 
comparison suggests that both models should fit equally well if all edges are sequentially established while 
the TERGM should have a relative advantage with increasing prevalence of multi-party edge events or 
incentives for collective action. However, so many separate simulation experiments would far exceed the 
page length boundaries afforded by a journal article.

Further, note that while our goal in the theoretical discussion above was to discuss the
TERGM and SAOM independently of their software implementations, software necessar¬ 
ily plays more of a role in this performance-based exercise. A model is certainly not equal 
to its software implementation, but the constraints and options of the software do affect 
the performance of the model on real data. We try, in what follows, to be as transparent 
as possible about what is a software limitation and what is a model limitation. 


3.2 Out-of-Sample Prediction 


We select the most impartial criterion for inter-model comparison possible: out-of-sample
predictive performance. Out-of-sample prediction involves the fitting of a model on one
dataset, called the training set, and the application of that fitted model to a different
dataset, called the test set. This is a powerful framework for performance testing because,
as long as there is no overlap between the training and test sets, the only thing they will
have in common is the data-generating process (DGP). Out-of-sample prediction is an
impartial criterion because (a) it is easily comparable across methodologies, whereas
other fit criteria are not; (b) in-sample prediction runs the risk of overfitting, where the
parameters of the models might capture idiosyncrasies of the sample rather than the
DGP; and (c) out-of-sample prediction ensures that the model that best captures the
DGP in nature will perform best.

Much of what follows may seem evident from other areas of statistics. However, as 
there is little published work detailing the use of out-of-sample prediction with these 
longitudinal network models, we take the time to consider some of the details of this 
methodology. In longitudinal network models, a training set of older networks is used to 
predict a test set of newer networks in the time series. Prior applications of out-of-sample
prediction to SAOMs include Desmarais and Cranmer (2012a), Kinne (2013), Koskinen
and Edling (2012), and Warren (2016). There is also a substantial literature looking at
out-of-sample prediction with other network models (e.g., Chiba, Metternich and Ward
2015; Montgomery, Hollenbach and Ward 2015; Ward et al. 2013).


Temporal Heterogeneity 

One may object that the DGP may change between the training and test sets, rendering
the predictive exercise meaningless. If the DGP changes substantially between the training
set and the test set, one should indeed expect to see poor out-of-sample predictive
performance. However, it is also the case that the very goal of longitudinal network models
is to describe a joint process taking place across multiple time steps. As such, if the
longitudinal network model is applied correctly, such out-of-sample testing should not
be problematic because the DGP should not be changing. If the DGP were changing,
the use of a longitudinal network model, either the SAOM or the TERGM, would be
problematic. If it is sensible to assume that the same model can be applied to all waves
of a network, then it should also be sensible to estimate the same model on the first k
waves and test it on the (k + 1)th wave using out-of-sample prediction since the
data-generating process does not change. Relatedly, if the model performs well out-of-sample,
this is strong evidence that the DGP has not changed.

A model that is overfitted will predict well in-sample (e.g., on the training set), but poorly
out-of-sample (e.g., on the test set).

Prediction or Explanation? 

Ultimately, most analyses are interested in explaining network phenomena rather than
predicting them. It might seem then that a predictive exercise such as ours is assessing
the model on the wrong criteria. We disagree, however. Though we use our models to
predict, they are not specified atheoretically such that their specification is unrelated to
an explanatory model. Any model specified to explain data from a specific DGP should
be applicable to new data that emerge from the same DGP. As long as the DGP is not
expected to change, as discussed above, testing an explanatory model out of sample is a
powerful means by which to judge the extent to which it actually captures the DGP.


Predict Network Structure or Edge Probability? 


It is not immediately clear whether a predictive exercise for network models should be
attempting to predict the frequency of occurrence of certain network structures (e.g.,
closed triads) (Hunter, Goodreau and Handcock 2012) or predict the formation/persistence
of dyadic edges. Our solution to this is to examine predictive performance on
both endogenous network characteristics and the location of edges in the graph. As the
literature does not host much discussion on this topic, this needs elaboration. Though
it may seem very natural to attempt the prediction of edges (indeed, such would be
the typical approach in statistics and the voluminous literature in computer science),
there is a case to be made against this approach. Consider an extreme case where the
network is generated only by endogenous network structure and not at all by exogenous
covariates. In such a case, the labels of the vertices should not matter and therefore edge
prediction should not be informative. That is, one would not expect the predictive performance
of the model, at the edge level, to differ from random, but one may be able to
predict structures, such as the number of closed triads, with substantial accuracy. This
is the reason that software packages such as statnet (Handcock et al. 2008; Hunter et al.
2008) and RSiena (Ripley et al. 2016; Snijders, van de Bunt and Steglich 2010) focus
their goodness-of-fit tests on network structures rather than dyadic edge predictions.
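The label-invariance argument in the extreme case can be illustrated with a toy undirected network. The sketch below is an assumption-laden illustration (the helper names `from_edges`, `closed_triads`, `relabel`, and `edge_agreement` are hypothetical): relabeling the vertices leaves the number of closed triads unchanged, while the dyad-by-dyad agreement with the original network drops.

```python
from itertools import combinations

def from_edges(n, edges):
    """Build a symmetric adjacency matrix from a list of undirected edges."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

def closed_triads(adj):
    """Number of vertex triples in which all three edges are present."""
    n = len(adj)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if adj[i][j] and adj[j][k] and adj[i][k])

def relabel(adj, perm):
    """Network in which new vertex i takes the role of old vertex perm[i]."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def edge_agreement(a, b):
    """Share of unordered dyads with the same state in both networks."""
    pairs = list(combinations(range(len(a)), 2))
    return sum(a[i][j] == b[i][j] for i, j in pairs) / len(pairs)

observed = from_edges(5, [(0, 1), (0, 2), (1, 2), (3, 4)])
shuffled = relabel(observed, [3, 4, 0, 1, 2])

print(closed_triads(observed), closed_triads(shuffled))
print(edge_agreement(observed, shuffled))
```

A structural goodness-of-fit statistic would treat the two networks as identical, whereas an edge-level measure would not, which is why the two kinds of assessment answer different questions.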

However, this extreme case is misleading as the order of vertex labels will matter in 
most empirical applications. On the one hand, most models will feature some exogenous 
dyadic or vertex-level covariates, which makes it important to predict edges between 
those vertices where the variation occurs. On the other hand, the absolute location of 
edges and subgraph structures in the network is especially important in longitudinal 
network models, such as the ones considered here, because otherwise local structures 
cannot be carried forward from one time step to the next. It would be highly unrealistic 
to assume that structures are sticky and change only as a Markov process during the 
mini-step updating, but then suddenly change their location in the network completely 
when the next time period starts. Therefore most useful applications feature some degree 
of path-dependence of the absolute location of edges, and it is an important consideration 
who is connected, rather than merely to what extent edges or certain structures show 
up somewhere in the network. For this reason, both measures are important aspects of 
model fit: auxiliary statistics capturing certain aspects of network structure as a measure
of the endogenous goodness of fit and the dyadic prediction performance as a measure of
the extent to which edges are predicted in the right “spot” in the network. 

Furthermore, the more the theory concerns exogenous factors, the more useful edge
prediction becomes. Consider the opposite extreme to that considered
above: a scholar has a theory about how exogenous factors cause edges to form in a 
given network. He/she models this process with a TERGM or SAOM because he/she is 
worried about controlling for potential endogenous dependencies, even though they are 
not of primary theoretical interest. In such a case, the dyad-wise predictive accuracy of 
the model would be paramount, and the extent to which the model predicts structures 
would be important only to satisfy the scholar that he/she had appropriately modeled 
any endogenous processes that may be at work. 

As such, we believe both types of goodness-of-fit assessment have merit and are actually
complementary. It is further important to keep in mind that both types of assessment
perform well with out-of-sample prediction (as well as in-sample prediction). 


3.3 Predictive Methodology 


We generate the out-of-sample predictions in the following way. For each of the below 
applications, we predict the final observation of the network using only observations that 
occur temporally prior. So, for both the TERGM and the SAOM, the first network is 
not modeled (it is used for temporal conditioning by both models) and the last network 
in the series is not modeled either because it is the object of prediction. 

For example, in a network that is observed at four time points, a TERGM for t = 2
and t = 3 is estimated based on the corresponding covariates at t = 1 and t = 2,
respectively. These covariates can be functions of the previous network, and in the cases 
we report below, we use a dyadic stability term that captures to what extent edges and 
non-edges are sticky over time. Once such models have been estimated, their resulting 
coefficients are used to simulate several new networks based on the covariates at t = 3, 
and the simulations created in lieu of time point 4 are compared to the observed network 
at t = 4. If the simulations predict the empirically observed network at time step 4 well, 
we can conclude that the out-of-sample predictive performance of the TERGM is good. 
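The partition of the time series used in this example can be written down explicitly. The sketch below only organizes the waves and does not estimate anything; the function name `split_waves` and the string labels standing in for networks are hypothetical.

```python
def split_waves(waves):
    """Split a time-ordered list of networks for out-of-sample prediction.

    Returns the (previous, current) pairs used for estimation, the network
    from which the prediction is simulated, and the held-out final network."""
    if len(waves) < 3:
        raise ValueError("need at least three waves: condition, model, predict")
    training_pairs = list(zip(waves[:-2], waves[1:-1]))
    prediction_base = waves[-2]
    test_network = waves[-1]
    return training_pairs, prediction_base, test_network

# Four observed waves, represented by labels for illustration only.
pairs, base, test = split_waves(["t=1", "t=2", "t=3", "t=4"])
print(pairs)  # estimation uses t=2 given t=1 and t=3 given t=2
print(base, "->", test)
```

The first wave appears only as a conditioning network and the last wave never enters estimation, so the comparison with the simulated networks is out of sample.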

The procedure for SAOMs is analogous to that of the TERGM: we let the SAOM 
simulate edge changes and estimate parameters between t = 1 and t = 2 and between 
t = 2 and t = 3. Then we extract the estimated parameters from the model and simulate 
a SAOM process forward from t = 3 to t = 4, holding the extracted parameters constant.
This is repeated multiple times in order to yield multiple simulated networks. The last
simulation mini-step between t = 3 and t = 4 (based on unconditional estimation) is used
to predict the actual network at t = 4. Both procedures, the SAOM and the TERGM 
out-of-sample prediction, are thus comparable. 

The actual comparison between the observed network and the distribution of simulated
networks for the same time step is carried out in two ways. First, we assess the
goodness of fit by comparing several simulated network statistics to their observed
counterparts, for example the distribution of geodesic distances or the distribution of
edgewise shared partners (Hunter, Goodreau and Handcock 2012). If these distributions are
similar to the observed distributions, we can be confident that the network-generating
process is captured by the model. We compare this fit across the different types of models
(SAOM vs. TERGM). Second, we assess the classification performance of the simulations
produced by each model using receiver operating characteristics (ROC) and precision-recall
(PR) curves and by computing the area under the curve (AUC) for each of the two
curves (Davis and Goadrich 2006; Hanley and McNeil 1982; Sing et al. 2005). In other
words, we assess the extent to which dyadic states (edge or no edge between each pair of
vertices i and j) are predicted accurately by the simulations. More specifically, these
procedures provide an intuitive understanding of sensitivity and specificity, or type I and
type II error, or the true positive rate and the true negative rate, for the prediction of
edges in the network.
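As an illustration of the second type of assessment, the following minimal sketch computes the areas under the ROC and PR curves for a set of simulated networks. It assumes, as one simple choice that is not taken from the text, that each dyad is scored by the share of simulations containing the edge. The function names are hypothetical, and the observed network is assumed to contain at least one edge and one non-edge.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def edge_scores(simulations: Sequence[Matrix], observed: Matrix) -> Tuple[List[float], List[int]]:
    """Score every ordered dyad i != j by the share of simulations with that edge.

    Returns the scores together with the observed labels (1 = edge, 0 = no edge).
    """
    n = len(observed)
    scores, labels = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-loops are not part of the prediction task
            scores.append(sum(sim[i][j] for sim in simulations) / len(simulations))
            labels.append(observed[i][j])
    return scores, labels


def curve_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Lower the threshold step by step; record (true positives, false positives)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp, k = [], 0, 0, 0
    while k < len(ranked):
        threshold = ranked[k][0]
        while k < len(ranked) and ranked[k][0] == threshold:  # handle tied scores together
            tp += ranked[k][1]
            fp += 1 - ranked[k][1]
            k += 1
        points.append((tp, fp))
    return points


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    pos = sum(labels)
    neg = len(labels) - pos
    area, prev_tpr, prev_fpr = 0.0, 0.0, 0.0
    for tp, fp in curve_points(scores, labels):
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return area


def auc_pr(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the precision-recall curve via step-wise summation."""
    pos = sum(labels)
    area, prev_recall = 0.0, 0.0
    for tp, fp in curve_points(scores, labels):
        recall, precision = tp / pos, tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area
```

Because the ROC curve uses the false positive rate, it rewards correctly predicted non-edges, whereas the PR curve only considers predicted and observed edges; this is why both are reported side by side.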


3.4 Evaluation of Predictive Fit using Simulated Data 

For our first test of the predictive performance of the SAOM and the TERGM, we create
artificial network data under a known data-generating process and then fit to the networks
the same model that was used to create the data in the first place. This does not
constitute a universal test of the predictive performance of the two models because there
is an infinite set of possible model specifications one could compare. Yet, if one of the
models outperforms the other model on a relatively simple specification across a range of
different parameters, this will be a strong indicator of differential predictive performance
in general. We choose two simple data-generating processes that are maximally compatible
with the TERGM and the SAOM, respectively. They are based on identical, simple
endogenous network statistics and differ only in their updating processes.

First, we independently draw four parameters from a uniform distribution ranging
from -3 to +3 at steps of 0.001. These are typical parameter values one could find
in an empirical SAOM or TERGM. Naturally, some combinations of these parameters
(e.g., extreme combinations where all four parameters are close to -3) may lead to
degenerate simulations, i.e., full or empty networks. If we encounter such nearly full 
or empty networks, we drop the parameters and sample a new set of parameters until 
they do not yield degenerate networks anymore. In practice, we drop parameters if they 
lead to initial networks with a density smaller than 0.03 or larger than 0.97. Indeed, the 
mean density of the simulated networks is 0.29 with a standard deviation of 0.22 for the 
SAOM process, and 0.33 with a standard deviation of 0.14 for the simulated data from 
the TERGM process. 
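The parameter draw and the rejection of degenerate networks can be sketched as follows. This is only an illustration under stated assumptions: `bernoulli_stub` is a hypothetical stand-in that ignores everything except the edges parameter and is not the ERGM or SAOM simulator used in the experiments; all function names are made up for this example.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[int]]


def density(net: Matrix) -> float:
    """Share of possible directed edges (excluding self-loops) that are present."""
    n = len(net)
    edges = sum(net[i][j] for i in range(n) for j in range(n) if i != j)
    return edges / (n * (n - 1))


def draw_parameters(rng: random.Random, k: int = 4) -> List[float]:
    """Draw k values uniformly from the grid -3.000, -2.999, ..., 3.000."""
    return [rng.randint(-3000, 3000) / 1000 for _ in range(k)]


def bernoulli_stub(theta: List[float], rng: random.Random, n: int = 20) -> Matrix:
    """Hypothetical stand-in simulator: independent edges with logit(p) = theta[0]."""
    p = 1 / (1 + math.exp(-theta[0]))
    return [[1 if i != j and rng.random() < p else 0 for j in range(n)] for i in range(n)]


def sample_until_valid(
    simulate_initial: Callable[[List[float], random.Random], Matrix],
    rng: random.Random,
    lower: float = 0.03,
    upper: float = 0.97,
    max_tries: int = 10000,
) -> Tuple[List[float], Matrix]:
    """Redraw parameters until the initial network is neither nearly empty nor nearly full."""
    for _ in range(max_tries):
        theta = draw_parameters(rng)
        net = simulate_initial(theta, rng)
        if lower <= density(net) <= upper:  # keep only non-degenerate draws
            return theta, net
    raise RuntimeError("no non-degenerate parameter set found")
```

In an actual replication, `simulate_initial` would be replaced by the respective TERGM or SAOM simulation routine.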

Second, the first three of the four parameters are used to simulate a series of six
networks with 20 vertices based on three model terms (and the three sampled parameters
attached to them): a baseline edges term

$$h_{\text{edges}} = \sum_{i,j} N_{ij} \qquad (10)$$

(or density in the case of the SAOM), a reciprocity term,

$$h_{\text{recip}} = \sum_{i,j} N_{ij} N_{ji}, \qquad (11)$$

and a transitive triplets statistic

$$h_{\text{trans}} = \sum_{i,j,k} N_{ij} N_{ik} N_{jk}. \qquad (12)$$

Note that the i index is removed from the sum in each case for the SAOM in order to
account for the actor-oriented perspective.
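To make the three statistics concrete, here is a minimal sketch that computes them from a binary adjacency matrix without self-loops. It assumes that the network-level statistics are plain sums over ordered index pairs and triples and that the actor-level (SAOM) versions fix the index i; the function names are illustrative only.

```python
from typing import List

Matrix = List[List[int]]


def edges(net: Matrix) -> int:
    """Number of directed edges: sum of N_ij over all i, j."""
    return sum(sum(row) for row in net)


def reciprocity(net: Matrix) -> int:
    """Sum of N_ij * N_ji over all ordered pairs (each mutual dyad counts twice)."""
    n = len(net)
    return sum(net[i][j] * net[j][i] for i in range(n) for j in range(n))


def transitive_triplets(net: Matrix) -> int:
    """Sum of N_ij * N_ik * N_jk over all i, j, k."""
    n = len(net)
    return sum(
        net[i][j] * net[i][k] * net[j][k]
        for i in range(n) for j in range(n) for k in range(n)
    )


def actor_statistics(net: Matrix, i: int) -> List[int]:
    """Actor-oriented versions: the same sums with the index i held fixed."""
    n = len(net)
    out_degree = sum(net[i])
    recip = sum(net[i][j] * net[j][i] for j in range(n))
    trans = sum(net[i][j] * net[i][k] * net[j][k] for j in range(n) for k in range(n))
    return [out_degree, recip, trans]
```

Summing the actor-level statistics over all actors recovers the network-level statistics, which is the sense in which the two models share the same endogenous terms.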

One of the two simulation experiments, the TERGM process, uses MCMC and an
ERGM formula (see the ERGM equation introduced earlier) based on the three sampled
parameters to create a single network. Based on this simulated network, a series of five
new, consecutive networks (time steps t = 2 to t = 6) is simulated forward using the same
three terms as for the first network and an additional dyadic stability term

$$h_{\text{stability}} = \sum_{ij} \left[ N_{ij}^{t} N_{ij}^{t-1} + \left(1 - N_{ij}^{t}\right)\left(1 - N_{ij}^{t-1}\right) \right] \qquad (13)$$

with the fourth sampled parameter value attached to it.

For the SAOM process, an empty network is created, and the SAOM actor selection
and edge updating process is iterated five times to obtain networks from t = 2 to t = 6.
For the rate-of-change function (introduced earlier), a parameter value of 40 is chosen,
which means that each actor is expected to reconsider his or her local network configuration
40 times per time unit (Stadtfeld 2015).
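The SAOM updating process described here can be sketched roughly as follows. This is a simplified textbook-style illustration, not the simulation code used for the experiments: it assumes a constant rate for all actors, exponential waiting times between mini-steps, and a multinomial logit choice among toggling one outgoing edge or changing nothing. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[int]]


def objective(net: Matrix, i: int, beta: Sequence[float]) -> float:
    """Actor i's objective function: density, reciprocity, and transitive triplets."""
    n = len(net)
    dens = sum(net[i])
    recip = sum(net[i][j] * net[j][i] for j in range(n))
    trans = sum(net[i][j] * net[i][k] * net[j][k] for j in range(n) for k in range(n))
    return beta[0] * dens + beta[1] * recip + beta[2] * trans


def mini_step(net: Matrix, beta: Sequence[float], rng: random.Random) -> None:
    """One mini-step: a random actor may toggle one outgoing edge (modifies net in place)."""
    n = len(net)
    i = rng.randrange(n)  # equal rates imply uniform actor selection
    options = [None] + [j for j in range(n) if j != i]  # None means "no change"
    utilities = []
    for j in options:
        if j is not None:
            net[i][j] = 1 - net[i][j]  # tentatively toggle the edge
        utilities.append(objective(net, i, beta))
        if j is not None:
            net[i][j] = 1 - net[i][j]  # undo the tentative toggle
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]  # multinomial logit weights
    choice = rng.choices(options, weights=weights, k=1)[0]
    if choice is not None:
        net[i][choice] = 1 - net[i][choice]


def simulate_period(net: Matrix, beta: Sequence[float], rate: float, rng: random.Random) -> Matrix:
    """Simulate one time unit; each actor expects `rate` opportunities to change."""
    n = len(net)
    current = [row[:] for row in net]  # leave the input network untouched
    clock = rng.expovariate(n * rate)
    while clock < 1.0:
        mini_step(current, beta, rng)
        clock += rng.expovariate(n * rate)
    return current
```

With 20 actors and a rate of 40, such a process would involve about 800 mini-steps per period on average, so the network at the end of a period can differ substantially from the one at its start.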

In both simulation experiments, the first network in the list is removed, and the five
networks that are based on the same edges, reciprocity, and transitive triplets parameters
but different treatments of the updating process are retained for further analysis. In
either case, the result of this procedure is a series of five networks which are serially
correlated and based on several simple endogenous statistics. 100 such series of networks
are independently created according to these rules in each experiment, and a SAOM
and a TERGM are applied to each series of networks in each experiment. Each of these
models contains an edges, a reciprocity, and a transitive triplets model term. While the
TERGM contains an additional edge stability term (see Equation 13) for modeling the
temporal updating, the SAOM uses its rate-of-change function and the objective function
to model change between the networks. Thus these models should be able to capture
the respective original data-generating process well, unless their inherent assumptions
limit their ability to do so. Moreover, the generality of each model can be assessed by
cross-checking the model fit against the other data-generating process.

We simulate 100 sets of five consecutive networks for each procedure, estimate 100
SAOMs and 100 TERGMs, simulate 10 new networks from each of the 100 respective
models out of sample, and then assess the predictive fit (as described in Section 3.3). In
both cases, the simulations are compared against the actual last network from the series
using the ROC and PR curves. To quantify the predictive performance of each model, the
area under the curve (AUC) is computed for both curves. In each simulation experiment,
this results in 100 AUC-ROC measures and 100 AUC-PR measures for the SAOM and the
same quantities for the TERGM. A two-sample t test for ROC (t(193) = 5.77, p < .001)
and PR (t(193) = 3.49, p < .001) reveals that the fit of the TERGM models is better than
the fit of the SAOM on average in the TERGM experiment, while there is no significant
difference in fit between the two models in the experiment with data simulated from
the SAOM process. To see this more clearly, Figure 1 shows boxplots that visualize the
distribution of all four AUC measures per simulation experiment as well as the average
differences between models.

Figure 1: Comparison of the area under the curve (AUC) for receiver operating characteristics (ROC)
and precision-recall (PR) curves for SAOM and TERGM. Simulations from a data-generating process
that is maximally compatible with the TERGM (on the left) and the SAOM (on the right).


The results indicate that the TERGMs usually outperform the SAOMs in the first
experiment, irrespective of whether non-edges are part of the performance measure (ROC) or
not (PR), as demonstrated by the fact that most of the “diff” distribution is positive in
both cases. There are, however, also a few cases where the SAOM fits better. Future
research should investigate whether this is due to a low consistency of the estimator or due
to a systematic pattern. In the second experiment, where the data-generating process of
the SAOM is precisely modeled, there is no significant difference between performances
of the models.
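The comparison of the two sets of AUC values rests on a two-sample t test; the text does not say which variant was used. As a minimal sketch, assuming Welch's unequal-variance version and two hypothetical lists of AUC values, the test statistic and its approximate degrees of freedom could be computed as follows.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(x) - mean(y)."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

A positive statistic for `welch_t(auc_tergm, auc_saom)` would indicate a higher average AUC for the TERGM; the p value would then be obtained from a t distribution with `df` degrees of freedom.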

While this comparison focuses on the actual location of edges in the network, another
important feature is whether a model fits well in terms of its endogenous properties.
This aspect of the goodness of fit can be checked by comparing auxiliary statistics like
the shared partner distribution, the degree distribution, and the distribution of geodesic
distances between the originally observed network and the out-of-sample simulations of
the model for the same time step. In order to compare the 100 SAOMs and TERGMs per
study, we plot the difference between the absolute deviation of the median TERGM
simulation from the original network and the absolute deviation of the median SAOM from
the original network. If a_s denotes the original observation of an endogenous statistic (for
example, how many dyads in the network have exactly four dyadwise shared partners?)
for the s’th simulation, b_s denotes the median of the same count for the s’th TERGM, and
c_s denotes the median for the SAOM, then

$$|b_s - a_s| - |c_s - a_s|$$

is the difference in deviations from the original network for a given statistic. Values
above 0 reflect a greater deviation from the true value for the TERGM, and values below
0 reflect a greater deviation from the true value for the SAOM. Figure 2 uses boxplots to
visualize the distributions of these values for the TERGM process. There is no noticeable
difference between the two models. The same pattern can be found when the SAOM
process is considered. As the diagrams for the two processes are nearly identical, the
SAOM comparison is reported in the Online Appendix.

Figure 2: TERGM process: endogenous fit. Simulations from a data-generating process that
is maximally compatible with the TERGM. Comparison of the SAOM and the TERGM based on
endogenous network properties. The boxplots show the difference between the absolute deviation of the
TERGM from the original data and the absolute deviation of the SAOM from the original data.
Distributions above 0 indicate a better performance of the SAOM, and distributions below 0 indicate a
better performance of the TERGM. In the first, second, and fifth panel, results tend to be in favor of
the TERGM; in the other panels, the models are on par.
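The difference in deviations plotted in these boxplots can be illustrated with a few lines of code. The sketch assumes that, for one simulated series and one endogenous statistic, the observed count and the counts from the TERGM and SAOM simulations are already available as plain numbers; the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def deviation_difference(
    observed: float,
    tergm_sims: Sequence[float],
    saom_sims: Sequence[float],
) -> float:
    """Absolute deviation of the TERGM median minus absolute deviation of the SAOM median.

    Positive values mean the TERGM is further from the observed value;
    negative values mean the SAOM is further away.
    """
    tergm_dev = abs(median(tergm_sims) - observed)
    saom_dev = abs(median(saom_sims) - observed)
    return tergm_dev - saom_dev
```

For example, with an observed count of 10, TERGM simulations of 9, 10, and 11, and SAOM simulations of 12, 14, and 16, the function returns -4, which would fall below 0 in the boxplots and thus favor the TERGM.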

The preliminary main finding here is that the two models seem to capture endogenous 
structure equally well irrespective of context while the TERGM more reliably recovers the 
position of edges in the network when a process maximally compatible with the TERGM is 
at work. Thus it is fair to say that both models arrive at the same conclusions regarding 
the topology of the network, but they differ in their assessment of which vertices are 
involved in the respective edges. As a caveat, we should note that this comparison is 
only based on a subset of the differences enumerated above. For example, strategic action 
or the size limitation were not tested for detrimental effects on model performance, so we 
cannot rule out that endogenous performance differs as well in some applications. The 
next section introduces such a case.

Even if an exhaustive answer about predictive performance cannot be given due to
the infinite set of possible models one could test, this simulation exercise is simple and
plausible enough to provide a first hint that the two models may differ in a range of
applications where the data-generating process is not perfectly in line with the specific
assumptions of the respective model. This underlines the need for comparative
goodness-of-fit assessment in empirical applications to justify model selection.


4 Real-Data Example: Replication of a Friendship 
Network SAOM Based on the Knecht Dataset 

So far, we have established that there are theoretical differences between the updating
processes of the TERGM and the SAOM, with a more specific process posited by the
SAOM. Even in a simple simulation exercise, these differences have been shown to be
consequential for the inferential performance of the SAOM, at least with regard to edge
prediction, but we do not know yet which specific aspects cause these differences.
Out-of-sample prediction is a useful tool for comparing the performance of the two models on
a given (simulated or empirical) dataset, though this certainly does not replace the role
of theory in guiding model selection.

The remainder of this article is dedicated to these further points: Sometimes theory
is not sufficient to evaluate a priori if the assumptions of the SAOM are reasonable in
a given context or if the TERGM should be used instead, as further research is needed
on whether and how the theoretical differences translate into different empirical
performance; even in apparently well-studied empirical contexts (such as the example provided
below), the specific updating posited by the SAOM may not be at work, leading to a
better performance of the TERGM. Therefore, when in doubt, users can easily use our
companion software to assess the out-of-sample performance of both models. This will
contribute in the long run to our collective understanding of the contexts in which one
model should be chosen over the other model. Moreover, we will demonstrate by example
that empirical cases exist where endogenous model fit differs out of sample between the
two models (in contrast to the simulations, where edge classification made a difference).

To make these points, we illustrate the performance-based comparison of the SAOM
and the TERGM by replicating an analysis reported by Snijders, van de Bunt and Steglich
(2010). Substantively, the authors model a friendship network in a Dutch school class over
several time steps. The data were originally collected by Knecht (2006, 2008) and later
used as the primary example for introducing stochastic actor-oriented models (Snijders,
van de Bunt and Steglich 2010). According to what we believe we know a priori about
friendship networks, the SAOM should be well suited for modeling the evolution of
friendship because the updating assumptions as set out above seem to be reasonable in this
context. The SAOM was originally designed with the application of friendship network
evolution in mind, and friendship networks are still one of the primary fields of
application. Furthermore, the application we consider is a frequent example used in expository
papers, workshops, and tutorials for the SAOM. The application has been studied
extensively with the SAOM and is “standard” in its literature. Therefore this empirical
application serves as a critical test case that is tailored maximally to the assumptions of
the SAOM. For example, one could argue that it is reasonable to think
that actors do in fact change their relations one at a time without coordination and that
actors will alter their edges to “improve” their position in the network.

The friendship networks under study were measured between September of 2003 and 
June of 2004. The data consist of 26 students in their first year of secondary school
and were recorded in four waves over the period of study. The subjects, 17 girls and 9 
boys aged 11-13, were asked to nominate up to twelve classmates they considered “good 
friends.” 

To address the fact that friendship networks tend to be reciprocal, sex-segregated,
and show tendencies towards triadic closure, Snijders, van de Bunt and Steglich (2010)
include reciprocity (see Equation 11), transitive triplets,

$$h_{\text{t-trip}} = \sum_{j,k} N_{ij} N_{ik} N_{jk}, \qquad (16)$$

transitive edges,

$$h_{\text{t-edges}} = \sum_{k} N_{ik} \max_{j} \left( N_{ij} N_{jk} \right), \qquad (17)$$

three-cycles,

$$h_{c} = \sum_{j,k} N_{ij} N_{jk} N_{ki}, \qquad (18)$$

and an indicator for when friendship edges are sexually homophilous, thus capturing sex
segregation:

$$h_{\text{same-sex}} = \sum_{j} N_{ij} \, I(\text{sex}_i = \text{sex}_j). \qquad (19)$$

Furthermore, they include the following degree-based measures as controls: indegree
popularity,

$$h_{\text{in-pop}} = \sum_{j} N_{ij} \sqrt{\sum_{k} N_{kj}}, \qquad (20)$$

out-degree popularity,

$$h_{\text{out-pop}} = \sum_{j} N_{ij} \sqrt{\sum_{k} N_{jk}}, \qquad (21)$$

and out-degree activity,

$$h_{\text{out-act}} = \left( \sum_{j} N_{ij} \right)^{1.5}, \qquad (22)$$

as well as exogenous indicators for male respondents (as a sender effect and as a receiver
effect) and friendship in primary school.

Table 1 shows the results of a faithful re-analysis of their “Model 0”, the most
elaborate model without actor-behavior co-evolution reported by Snijders, van de Bunt and
Steglich (2010), and a TERGM with the same specification. While the model terms are
faithful, the estimates reported in Table 1 are based on a reduced dataset without the
last time step in order to permit out-of-sample prediction. To get as close as possible
to the SAOM with its rate-of-change function and mini-steps, a dyadic stability term
(Equation 13) is added to the TERGM, which accounts for the inertia of both edges and
non-edges in the data-generating process. While a SAOM models the change between
time steps t - 1 and t, a TERGM models the state of the network at t, and therefore
the change between t - 1 and t can be introduced into the model by conditioning on the
edges and non-edges of the network at t - 1 via the dyadic stability term. As the same
information (the previous and current networks) and the same number of parameters (the
parameters for the model terms and the rate parameter versus the stability term) enter
both models, there is no reason to expect a priori that either of the two models should
predict the last time step more successfully than the other model with an identical set
of model terms, unless the assumptions the respective model makes do not capture the
true data-generating process.

                             SAOM               TERGM
Density/edges                -1.49 (0.56)**     -10.45 (0.95)***
Reciprocity                   1.42 (0.29)***      2.43 (0.39)***
Transitive triplets           0.38 (0.07)***      0.06 (0.05)
Cyclic triplets              -0.27 (0.12)*       -0.49 (0.13)***
Transitive edges              0.55 (0.26)*        0.14 (0.21)
Indegree of alter (sqrt)     -0.05 (0.21)         1.44 (0.24)***
Outdegree of alter (sqrt)    -0.53 (0.29)        -0.02 (0.13)
Outdegree of ego (sqrt)      -0.03 (0.11)         1.81 (0.21)***
Same primary school           0.47 (0.17)**       0.65 (0.26)*
Male (alter)                 -0.05 (0.15)         0.29 (0.24)
Male (ego)                    0.28 (0.16)         0.61 (0.24)**
Male (match)                  0.66 (0.16)***      1.81 (0.27)***
Rate parameter period 1       9.43 (1.84)***
Rate parameter period 2       9.45 (1.94)***
Dyadic stability                                  1.09 (0.13)***

***p < 0.001, **p < 0.01, *p < 0.05

Table 1: Re-analysis of “Model 0” using SAOM and TERGM, based on the first three time steps.

The results differ substantially between the two models. The coefficients are scaled
differently and so should not be compared directly, but in many cases even the direction
and significance of estimates is reversed. In particular, transitive triplets, transitive edges,
indegree of alter, and outdegree of ego yield substantively different conclusions. If these
effects were of substantive interest to the researcher, this divergence would be alarming.
Furthermore, the estimates of both models are roughly comparable with the estimates
gathered from identical models applied to all four time points. This indicates that the
data-generating process does not change between the first three waves and the last wave,
which is used for out-of-sample prediction.

Although we could try to tweak the two models and deviate from the model suggested
by the original authors in order to improve model fit altogether, this would make our
analysis vulnerable to criticism because doing so may favor one of the models over the
other. Therefore we try to remain as impartial as possible by re-using the theoretical
specification proposed by Snijders, van de Bunt and Steglich (2010).

We use out-of-sample prediction of time step 4 to assess which model fits the data
best. To do so, we estimate the models without information on the network at the last
time step (as reported in Table 1), simulate 100 new networks from each model in lieu of
the last time step, and compare these simulations to the observed network
at the fourth time step. Figure 3 presents boxplots of endogenous network statistics
(“auxiliary statistics”) in the simulated networks (as shown by the gray boxplots) versus
the observed network (indicated by a black line). This approach focuses on the structure
of the network irrespective of the ordering of the actual vertices. Very good fit occurs
when the observed data intersect the medians of the predicted data. Figure 4 shows the
ROC and PR curves as well as the area under these curves as complementary measures
of the goodness of fit. These measures take into account the ordering of the actual
vertices and assess the classification performance of the models, that is, how many dyads are
predicted correctly by the simulations.

Figure 3: Out-of-sample fit according to auxiliary network statistics.

The comparison of the boxplots across the two models reveals that the TERGM fits
substantially better than the SAOM in this case. With all four auxiliary statistics, the
simulated networks capture the observed network much more closely. Overall, there is
some small room for improvement even in the TERGM, but the fit seems adequate for
an out-of-sample prediction.

The ROC and PR comparison also confirms that the TERGM has a better out-of-sample
model fit, by a large margin. The TERGM outperforms the SAOM presented
here significantly regarding the state of the specific dyads, no matter whether non-edges
are part of the prediction task (as in the ROC curve) or not (as in the PR curve).

Figure 4: Out-of-sample fit according to the receiver operating characteristic (ROC) curve (left panel)
and precision-recall (PR) curve (center panel), and aggregate comparison of the area under the curve
(AUC) for all models and curves.

Three lessons can be learned from this replication exercise. First, even in a social
context around which the SAOM was originally designed, the TERGM outperforms the
SAOM with regard to predictive performance. This is particularly noteworthy because
both models were parametrized as similarly as possible. The only differences are the core
assumptions inherent in the two models: the rate function and the mini-step updating
process of the SAOM versus the dyadic stability term added to the TERGM. Just like the
updating process in the SAOM, the dyadic stability term in the TERGM is a very specific
way of modeling time dependencies. Yet the TERGM is considerably more flexible as to
what kinds of temporal dependencies can be added should the data-generating process
turn out to be of a different kind (e.g., edge innovation or loss, autoregression, delayed
reciprocity, etc.). Here, merely conditioning on the previous time step explicitly yields a
better prediction of the successive time step than Markovian mini-step updating from the
previous time step onwards. However, it is not clear whether this is the only or even the
decisive mechanism leading to different model fit, as the SAOM is built around a range
of specific assumptions, any of which could have led the updating process astray.

Second, even though we had no reason to cast doubt on the reasonableness of the
SAOM updating process in the case of friendship networks a priori, it turned out that
the estimates and simulations produced by the TERGM are more closely aligned with
the observed network. This is somewhat surprising and leads us to conclude that a priori
model choice cannot always be informed by theory alone, at least given our current
knowledge as to what is causing the limitations of the SAOM in this specific case.
Candidates were enumerated above in the theoretical section. This underlines, once more,
the need for more focused simulation experiments on each of the differences between the
models separately.

And third, since theory alone cannot be invoked at the present time to choose one
model over the other, we suggest the use of out-of-sample prediction in any given
application to inform model choice for the time being. Our companion software supplied
in the Online Appendix makes such comparisons trivial. If many such comparisons are
reported, this may soon lead to a better collective understanding of the contexts in which
the TERGM is indeed to be preferred over the SAOM and vice versa and therefore provide
complementary evidence to the simulation experiments suggested above.

5 Conclusion 

We have shown that the TERGM and SAOM share a similar mathematical core, but
differ with respect to several important assumptions about how the network evolves
between observations. More specifically, the SAOM posits a particular process by which
an observed network at t - 1 transitions into the observed network at t. This process
involves two stochastic sub-processes in which (a) vertices are selected for a chance to
update their edge profiles and (b) the choice is made whether or not to change an edge
(create a new one or dissolve an existing one). The TERGM, conversely, is less specific.
Though the TERGM is potentially consistent with a broader set of dynamic network
processes, the SAOM could be thought more appropriate in cases where its updating process
is consistent with theory.

We conducted two comparative analyses of the TERGM and SAOM. In the first,
we simulated data from a simple, entirely endogenous data-generating process. This
was an instructive exercise because the ground truth was known, and we could judge
clearly which models recovered coefficients closest to the actual data-generating process.
Here, we found that the TERGM out-performed the SAOM by a substantial margin with
regard to edge prediction while there was no difference in endogenous model fit. Second,
we considered a real-data example that maximally conforms to the updating process and
application area for which the SAOM was designed: a friendship network of students in
a Dutch school. In the real-data application, we replicated a SAOM that is commonly
used in RSiena documentation and workshops. We found that, when considering
out-of-sample predictive performance, the TERGM again outperformed the SAOM by a
substantial margin, both with regard to endogenous structures and the location of edges
in the network.

From a theoretical perspective, a very tentative conclusion we might reach, given 
the core similarity of the two models but the detailed updating process proffered by the 
SAOM, is that the SAOM should perform well when its core assumptions are met. This 
is supported by the simulation study where the SAOM and the TERGM could recover 
the SAOM DGP equally well but the two models differed with respect to the TERGM 
DGP. This conclusion, though, is in tension with our real-data example: indeed, even 
on the type of data for which the SAOM was designed and which the developers of 
the SAOM use as an expository case, the TERGM out-performed the SAOM. Yet it is 
not the case that the TERGM always outperforms the SAOM: a small fraction of the 
simulations were apparently more in line with the DGP of the SAOM (while the majority 
of the simulations could be better predicted by the TERGM). Our conclusion thus must
be that the SAOM needs its updating assumptions to be met with a very high degree of
precision in order to outperform the more general TERGM.

A second, and far more pragmatic, conclusion is that an empirical comparison can be 
conducted whenever it is not possible to choose the SAOM or the TERGM on theoretical 
grounds. Because theoretical conformity to the data-generating process of the SAOM 
did not yield predictable results, we must be careful not to put too much stock into 
the a priori selection of a model. Rather, as it is straightforward to contrast the
out-of-sample (or in-sample) predictive performance of the two models (the xergm package
has easy-to-use functions for making such comparisons), the prudent researcher may be
uncontroversially advised to do so.

Third, we have only touched the surface of this topic by contrasting carefully selected 
extreme cases. Much more work needs to be done in order to carve out which of the 
specific parts of the models lead to the differences we could observe. Our theoretical 
section offers a number of suggestions on factors that could make a difference. Future
research will need to evaluate these factors as impartially as possible using simulation
experiments and possibly carefully selected case studies.

A Online Appendix


A.1 Endogenous Out-of-Sample Fit for the SAOM Experiment

Figure 5: SAOM process: endogenous fit. Simulations from a data-generating process that is
maximally compatible with the SAOM. Comparison of the SAOM and the TERGM based on endogenous 
network properties. The boxplots show the difference between the absolute deviation of the TERGM from 
the original data and the absolute deviation of the SAOM from the original data. Distributions above 0 
indicate a better performance of the SAOM, and distributions below 0 indicate a better performance of 
the TERGM. The two models are on par in this DGP. 


A.2 Alternative Memory Terms for the TERGM 


As memory terms in TERGMs have not received extensive treatment outside of Leifeld,
Cranmer and Desmarais (2016a), the reader may wish to consider several example memory
terms beyond what we discussed in the main text. The memory terms considered
below are based on those in the extensive discussion from Leifeld, Cranmer and Desmarais
(2016a) and are by no means exhaustive, as a memory term can include arbitrary
functions of time.

Several intuitive and convenient memory terms are as follows: 


1. Positive autoregression (lagged outcome network): $h_a = \sum_{ij} N_{ij}^{t} N_{ij}^{t-1}$

This memory term adds value to the statistic any time an edge persists from one
period to the next. Note that it can be modified by adding dependencies on networks
further removed than K = 1. Note, too, that, as formulated above, this is
equivalent to adding a lagged network as a covariate in the TERGM.

2. Dyadic stability: $h_s = \sum_{ij} \left[ N_{ij}^{t} N_{ij}^{t-1} + (1 - N_{ij}^{t})(1 - N_{ij}^{t-1}) \right]$

Dyadic stability is a straightforward extension of positive autocorrelation, but
accounting for positive autocorrelation in nonexistent edges as well as existing ones.
This simple statistic was also discussed in the main text.

3. Edge innovation/loss: $h_e = \sum_{ij} N_{ij}^{t} (1 - N_{ij}^{t-1})$ (innovation) and
$h_l = \sum_{ij} (1 - N_{ij}^{t}) N_{ij}^{t-1}$ (loss).

These memory terms focus on the formation of new edges (innovation) or the
dissolution of existing edges (loss) between time periods.

4. Delayed reciprocity (ego delays): $h_r = \sum_{ij} N_{ji}^{t} N_{ij}^{t-1}$

This statistic skirts the edge between a memory term and an endogenous
dependency. It is a reciprocity statistic in which the ego reciprocates only after the alter
has formed an edge to it. An alter-delays version may be created by swapping the
order of i and j in the subscripts.
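The memory terms listed above can be computed directly from two consecutive adjacency matrices. The following minimal sketch assumes binary matrices without self-loops and sums over ordered dyads i != j, with an edge counted as persisting when it is present at both time points; the function names are illustrative and not part of any package.

```python
from typing import Callable, List

Matrix = List[List[int]]


def _dyad_sum(curr: Matrix, prev: Matrix, term: Callable[[int, int, int], int]) -> int:
    """Sum term(N_ij at t, N_ij at t-1, N_ji at t) over all ordered dyads i != j."""
    n = len(curr)
    return sum(
        term(curr[i][j], prev[i][j], curr[j][i])
        for i in range(n) for j in range(n) if i != j
    )


def autoregression(curr: Matrix, prev: Matrix) -> int:
    """Edges present at both t-1 and t."""
    return _dyad_sum(curr, prev, lambda c, p, r: c * p)


def dyadic_stability(curr: Matrix, prev: Matrix) -> int:
    """Dyads whose state (edge or non-edge) is unchanged between t-1 and t."""
    return _dyad_sum(curr, prev, lambda c, p, r: c * p + (1 - c) * (1 - p))


def innovation(curr: Matrix, prev: Matrix) -> int:
    """Edges present at t that were absent at t-1."""
    return _dyad_sum(curr, prev, lambda c, p, r: c * (1 - p))


def loss(curr: Matrix, prev: Matrix) -> int:
    """Edges present at t-1 that are absent at t."""
    return _dyad_sum(curr, prev, lambda c, p, r: (1 - c) * p)


def delayed_reciprocity(curr: Matrix, prev: Matrix) -> int:
    """Edges i -> j at t-1 that are answered by an edge j -> i at t."""
    return _dyad_sum(curr, prev, lambda c, p, r: r * p)
```

A useful sanity check is that stability, innovation, and loss together account for every ordered dyad, so their sum equals n(n - 1) for a network with n vertices.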